Art Unit: 2655

DETAILED ACTION 
Response to Arguments 

1. Applicant's arguments filed 8/26/2004 have been fully considered but they are
not persuasive. The applicant traverses the prior art rejection based on the limitations
"searching a graph in which each path through the graph identifies a sequence of
segments of the source utterances", "forming a first part of the graph that encodes a
sequence of segments and a corresponding sequence of unit labels for each of the
source utterances" and "forming a second part that encodes allowable transitions
between segments of different utterances". However, Kuhn et al. (US 6029132)
anticipate all the limitations listed above in that a plurality of paths in the decision tree must
be searched in order to generate the n-best phoneme candidates (col. 4, lines 30-57).
Kuhn et al. also teach the step of forming a first part of the graph that encodes a
sequence of segments and a corresponding sequence of unit labels for each of the
source utterances (Text-Based Pronunciation Generator 16 in figure 1 assigns phoneme
candidates to each letter) and forming a second part that encodes allowable transitions
between segments of different utterances (Phoneme-Mixed Tree Score Estimator 20 in
figure 1 determines the transition from one phoneme to another).

2. Mohri is not relied upon for teaching speech synthesis; it is relied upon for
teaching that speech segments are associated with unit labels and that each speech
segment boundary is associated with a transition label.
Claim Rejections - 35 USC § 102

3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

4. Claims 1-13 and 18 are rejected under 35 U.S.C. 102(b) as being anticipated by 
Kuhn et al. (US Patent No. 6029132). 

1. Regarding claims 1 and 18, Kuhn et al. disclose a method for selecting segments
from a corpus of source utterances for synthesizing a target utterance (figure 1) and
software stored on a computer-readable medium for causing a computer to perform
functions comprising selecting segments from a corpus of source utterances for
synthesizing a target utterance (the operation of figure 1 can be implemented in
software), wherein selecting the segments comprises:

searching a graph in which each path through the graph identifies a sequence of
segments of the corpus of source utterances and a corresponding sequence of unit
labels that characterizes a pronunciation of a concatenation of that sequence of
segments, each path being associated with a numerical score that characterizes a
quality of the sequence of segments (referring to figures 1-3, Text-Based Pronunciation
Generator 16 searches the Decision Tree 10 to generate a list of pronunciations 18
representing possible pronunciation candidates. The Phoneme-Mixed Tree Score
Estimator 20 searches the Phoneme-Mixed Decision Tree 12 to assess the viability of
each pronunciation in list 18);

wherein searching the graph includes matching a pronunciation of the target
utterance to paths through the graph, and selecting segments for synthesizing the target
utterance based on numerical scores of matching paths through the graph (col. 5, ln.
32-60).

2. Regarding claim 2, Kuhn et al. further disclose that selecting segments for
synthesizing the target utterance includes identifying a path through the graph that
matches the pronunciation of the target utterance and selecting the sequence of
segments that is identified by the determined path (col. 2, ln. 66 to col. 3, ln. 6 and col.
5, ln. 33-60).

3. Regarding claims 3 and 4, Kuhn et al. further disclose that determining the path
includes determining a best scoring path through the graph (col. 4, ln. 30-38) and that
determining the best scoring path involves using a dynamic programming algorithm (col.
4, ln. 30-38).

4. Regarding claim 5, Kuhn et al. further disclose a method for concatenating the
selected sequence of segments to form a waveform representation of the target
utterance (col. 6, ln. 1-8).
5. Regarding claim 6, Kuhn et al. further disclose that selecting the segments for
synthesizing the target utterance includes determining a plurality of paths through the
graph that each matches the representation of the pronunciation of the target utterance
(tables 18 and 22; each pronunciation represents a particular path).

6. Regarding claim 7, Kuhn et al. further disclose that selecting the segments
further includes forming a plurality of sequences of segments, each associated with a
different one of the plurality of paths (the results recorded in tables 18 and 22, with each
pronunciation representing a particular path).

7. Regarding claim 8, Kuhn et al. further disclose that selecting the segments
further includes selecting one of the sequences of segments based on characteristics of
those sequences of segments not determined by the corresponding sequences of unit
labels associated with those sequences (col. 5, ln. 33-60).

8. Regarding claim 9, Kuhn et al. further disclose a method for forming a
representation of a plurality of pronunciations of the target utterance (tables 18 and 22
in figure 1), and wherein searching the graph includes matching any of the
pronunciations of the target utterance to paths through the graph (the operation of
elements 10, 16 and elements 20, 50 in figure 1).
9. Regarding claim 10, Kuhn et al. further disclose a method for forming a
representation of the pronunciation of the target utterance in terms of alternating unit
labels and transition labels (col. 4, ln. 30-38).

10. Regarding claim 11, Kuhn et al. further disclose that the graph includes a first
part that encodes a sequence of segments and a corresponding sequence of unit labels
for each of the source utterances (col. 3, ln. 18-25), and a second part that encodes
allowable transitions between segments of different source utterances and encodes a
transition score for each of those transitions (col. 4, ln. 30-38); and

matching the pronunciation of the target utterance to paths through the graph
includes considering paths in which each transition between segments of different
source utterances identified by that path corresponds to a different subpath of that path
that passes through the second part of the graph (col. 4, ln. 30-38; different possible
combinations of pronunciations are constructed).

11. Regarding claim 12, Kuhn et al. further disclose that selecting the segments for
synthesis includes evaluating a score for each of the considered paths that is based on
the transition scores associated with the subpaths through the second part of the graph
(col. 4, ln. 30-38; evaluating by determining and selecting the n-best candidates).

12. Regarding claim 13, Kuhn et al. further disclose that a size of the second part of
the graph is substantially independent of a size of the source corpus, and a complexity
of matching the pronunciation through the graph grows less than linearly with the size of
the corpus (the node leaf in figures 2-3 represents the second part of the graph. Comparing
the size of the node leaf with the whole corpuses 10 and 10, the size of the node leaf is
substantially smaller than the corpuses. Thus, the number of possible combinations of
pronunciations at a particular node leaf is much smaller than the size of the corpuses).

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 14-17 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kuhn et al. (US Patent No. 6029132) in view of Mohri et al. (US Patent No. 6243679). 

13. Regarding claim 14, Kuhn et al. further disclose the method of claim 1 further
comprising: providing the corpus of source utterances (col. 6, ln. 1-7), forming the
graph, including forming a first part of the graph that encodes a sequence of segments
and a corresponding sequence of unit labels for each of the source utterances (col. 3,
ln. 18-25), and forming a second part that encodes allowable transitions between
segments of different source utterances and encodes a transition score for each of
those transitions (col. 4, ln. 30-38; all possible combinations of pronunciations).
Kuhn et al. do not disclose that each source utterance is segmented into a
sequence of segments, that each consecutive pair of segments in a source utterance
forms a segment boundary, or that each speech segment is associated with a unit
label and each segment boundary is associated with a transition label.

However, Mohri et al. teach that each source utterance is segmented into a
sequence of segments, each consecutive pair of segments in a source utterance
forming a segment boundary (col. 4, ln. 1-31), and that each speech segment is
associated with a unit label and each segment boundary is associated with a
transition label (col. 4, ln. 1-31). The advantage of using the teaching of Mohri et al. in
Kuhn et al. is to interpret the input sequence more correctly.

Because Kuhn et al. and Mohri et al. are analogous art from the
same field of endeavor, it would have been obvious to one of ordinary skill in the art at
the time of invention to modify Kuhn et al. by incorporating the teaching of Mohri et al. in
order to interpret the input sequence more correctly.

14. Regarding claim 15, Kuhn et al. further disclose that forming the second part of
the graph is performed independently of the utterances in the corpus of source
utterances (referring to figures 2-3, all possible combinations of pronunciations are
constructed with phonemes at that particular node leaf. Thus, the second part of the graph
is independent of the corpus).
15. Regarding claim 16, Kuhn et al. further disclose a method of adding
pronunciations to the corpus through the training phase (col. 6, ln. 1-7), but do not
specifically disclose a method for augmenting the corpus of source utterances with
additional utterances; and augmenting the graph including augmenting the first part of
the graph to encode the additional utterances, and linking the augmented first part to
the second part without modifying the second part based on the additional utterances.
However, one of ordinary skill in the art at the time of invention would have readily
recognized that new pronunciations can be added to the corpus
through the training process. The new pronunciations would then be used to represent
the pronunciations of the input sequence and all possible combinations of
pronunciations at a particular node leaf. This enables the system to personalize the
synthetic speech.

16. Regarding claim 17, Kuhn et al. do not disclose that the graph is associated with 
a finite-state transducer which accepts input symbols that include unit labels and 
transition labels, and that produces identifiers of segments of the source utterances, 
and wherein searching the graph is equivalent to composing a finite-state transducer 
representation of a pronunciation of the target utterance with the finite-state transducer 
with which the graph is associated. 

However, Mohri et al. teach that the graph is associated with a finite-state
transducer which accepts input symbols that include unit labels and transition labels,
and that produces identifiers of segments of the source utterances (col. 10, ln. 28 to col.
11, ln. 67), and wherein searching the graph is equivalent to composing a finite-state
transducer representation of a pronunciation of the target utterance with the finite-state
transducer with which the graph is associated (col. 11, ln. 31-67). The advantage of
using the teaching of Mohri et al. in Kuhn et al. is to achieve time and space
minimization efficiencies.

Application/Control Number: 09/954,979

Because Kuhn et al. and Mohri et al. are analogous art from the
same field of endeavor, it would have been obvious to one of ordinary skill in the art at
the time of invention to modify Kuhn et al. by incorporating the teaching of Mohri et al. in
order to achieve time and space minimization efficiencies (col. 1, ln. 60 to col. 2, ln. 2).

Conclusion 

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Huyen Vo whose telephone number is 703-305-8665. 
The examiner can normally be reached on M-F, 9-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Doris To, can be reached on 703-305-4827. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Examiner Huyen X. Vo January 5, 2005 